The spread of information through social media platforms can create environments harmful to vulnerable communities and silence certain groups in society. To mitigate such instances, several models have been developed to detect hate and offensive speech. Since detecting hate and offensive speech on social media platforms could wrongly exclude individuals from those platforms and thereby reduce trust, there is a need to create explainable and interpretable models. Thus, we built an explainable and interpretable high-performance model based on the XGBoost algorithm, trained on Twitter data. For the unbalanced Twitter data, XGBoost outperformed the LSTM, AutoGluon, and ULMFiT models on hate speech detection, with an F1 score of 0.75 compared to 0.38, 0.37, and 0.38 respectively. When we down-sampled the data to three separate classes of approximately 5000 tweets each, XGBoost again outperformed LSTM, AutoGluon, and ULMFiT on hate speech detection, with F1 scores of 0.79 versus 0.69, 0.77, and 0.66 respectively. XGBoost also performed better than LSTM, AutoGluon, and ULMFiT on offensive speech detection in the down-sampled version, with F1 scores of 0.83 versus 0.88, 0.82, and 0.79 respectively. We applied SHapley Additive exPlanations (SHAP) to our XGBoost model's output to make it explainable and interpretable, in contrast to the black-box LSTM, AutoGluon, and ULMFiT models.
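As a hedged illustration of the kind of pipeline this abstract describes, the sketch below trains XGBoost on TF-IDF features and inspects it with SHAP's TreeExplainer; the toy data, feature settings, and hyperparameters are assumptions for illustration, not the authors' setup.

```python
# Minimal sketch (not the authors' released code): TF-IDF features feed an
# XGBoost classifier, and SHAP attributes each prediction to individual tokens.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
import xgboost as xgb
import shap

# Toy stand-in for the labelled tweets (0 = neither, 1 = offensive, 2 = hate).
tweets = ["you are wonderful", "this is rubbish", "awful slur here"] * 20
labels = [0, 1, 2] * 20

X = TfidfVectorizer().fit_transform(tweets).toarray()
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)

model = xgb.XGBClassifier(eval_metric="mlogloss")
model.fit(X_tr, y_tr)
print("macro F1:", f1_score(y_te, model.predict(X_te), average="macro"))

# TreeExplainer yields exact Shapley values for tree ensembles, which is what
# makes the XGBoost pipeline inspectable, unlike the black-box baselines.
shap_values = shap.TreeExplainer(model).shap_values(X_te)
```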
With the growing sophistication and volume of cyber attacks, combined with complex network structures, it is becoming extremely difficult for security analysts to corroborate evidence and identify multistage campaigns on their network. This work develops HeAT (Heated Alert Triage): given a critical indicator of compromise (IoC), e.g., a severe IDS alert, HeAT produces a HeATed Attack Campaign (HAC) depicting the multistage activities that led up to the critical event. We define the concept of "Alert Episode Heat" to represent the analyst's opinion of how much an event contributes to the attack campaign of the critical IoC, given their knowledge of the network and security expertise. Leveraging a network-agnostic feature set, HeAT learns the essence of the analyst's assessment of "HeAT" for a small set of IoCs, and applies the learned model to extract insightful attack campaigns for IoCs not seen before, even across networks by transferring what has been learned. We demonstrate the capabilities of HeAT with data collected in the Collegiate Penetration Testing Competition (CPTC) and through collaboration with a real-world SOC. We developed HeAT-Gain metrics to demonstrate how analysts may assess and benefit from the extracted attack campaigns, in comparison to common practices where IP addresses are used to corroborate evidence. Our results demonstrate the practical uses of HeAT: it finds campaigns that span diverse attack stages, removes a significant volume of irrelevant alerts, and achieves coherence with the analyst's original assessments.
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam questions), surpassing the prior state of the art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.
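For context, instruction prompt tuning builds on vanilla prompt tuning, in which a handful of learnable soft-prompt vectors are prepended to the input embeddings while the LLM itself stays frozen. A minimal sketch of that general idea follows; it is an assumption-laden illustration, not Med-PaLM's implementation.

```python
# Hedged sketch of soft prompt tuning in general (not Med-PaLM's exact recipe):
# only the prompt vectors receive gradients; the frozen LLM consumes the
# concatenated embeddings as usual.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        # Learnable prompt vectors, the only trainable parameters here.
        self.prompt = nn.Parameter(torch.randn(n_tokens, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the learned prompt to every sequence in the batch.
        batch = input_embeds.size(0)
        return torch.cat([self.prompt.expand(batch, -1, -1), input_embeds], dim=1)

# Usage: embeddings of shape (batch, seq_len, d_model) gain n_tokens soft tokens.
head = SoftPrompt(n_tokens=20, d_model=512)
extended = head(torch.randn(4, 128, 512))  # -> (4, 148, 512)
```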
Prognostication for lung cancer, a leading cause of mortality, remains a complex task, as it needs to quantify the associations between risk factors and health events spanning a patient's entire life. One challenge is that an individual's disease course involves non-terminal (e.g., disease progression) and terminal (e.g., death) events, which form semi-competing relationships. Our motivation comes from the Boston Lung Cancer Study, a large lung cancer survival cohort, which investigates how risk factors influence a patient's disease trajectory. Following developments in the prediction of time-to-event outcomes with neural networks, deep learning has become a focal area for the development of risk prediction methods in survival analysis. However, limited work has been done to predict multi-state or semi-competing risk outcomes, where a patient may experience adverse events such as disease progression prior to death. We propose a novel neural expectation-maximization algorithm to bridge the gap between classical statistical approaches and machine learning. Our algorithm enables estimation of the non-parametric baseline hazards of each state transition, risk functions of predictors, and the degree of dependence among different transitions, via a multi-task deep neural network with transition-specific sub-architectures.  We apply our method to the Boston Lung Cancer Study and investigate the impact of clinical and genetic predictors on disease progression and mortality.
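For readers unfamiliar with semi-competing risks, the usual illness-death framing involves three transition-specific hazards. The notation below is a common textbook parameterization, assumed here rather than taken from the paper; in this paper's setting the risk functions $g_k$ would be the transition-specific neural sub-architectures.

```latex
% Common illness-death parameterization (an assumption, not the paper's notation)
\begin{align*}
\lambda_1(t \mid x) &= \lambda_{01}(t)\, e^{g_1(x)} && \text{(healthy} \to \text{progression)} \\
\lambda_2(t \mid x) &= \lambda_{02}(t)\, e^{g_2(x)} && \text{(healthy} \to \text{death)} \\
\lambda_3(t \mid x, t_1) &= \lambda_{03}(t)\, e^{g_3(x)} && \text{(progression at } t_1 \to \text{death, } t > t_1\text{)}
\end{align*}
```

Each transition $k$ has a non-parametric baseline hazard $\lambda_{0k}$ and a covariate risk function $g_k$; dependence among transitions is typically induced by a shared frailty term.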
Self-supervised learning (SSL) aims to produce useful feature representations without access to any human-labeled data annotations. Due to the success of recent SSL methods based on contrastive learning, such as SimCLR, this problem has gained popularity. Most current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective and then discard the learned projection head after training. This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training? In this work, we first perform a systematic study on the behavior of SSL training focusing on the role of the projection head layers. By formulating the projection head as a parametric component for the InfoNCE objective rather than a part of the network, we present an alternative optimization scheme for training contrastive learning based SSL frameworks. Our experimental study on multiple image classification datasets demonstrates the effectiveness of the proposed approach over alternatives in the SSL literature.
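To make the setup concrete, here is a minimal PyTorch sketch of the conventional recipe the paper interrogates: a small MLP projection head on top of backbone features, trained with a SimCLR-style InfoNCE loss and discarded afterwards. Dimensions and temperature are illustrative assumptions.

```python
# Sketch of the standard contrastive SSL setup (assumptions, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """MLP head appended to the backbone and thrown away after SSL training."""
    def __init__(self, dim_in=512, dim_out=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, dim_in), nn.ReLU(),
                                 nn.Linear(dim_in, dim_out))
    def forward(self, h):
        return self.net(h)

def info_nce(z1, z2, temperature=0.5):
    """SimCLR-style InfoNCE: each view's positive is the other view of the same image."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2N, d)
    sim = z @ z.t() / temperature                      # cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Usage: h1, h2 are backbone features of two views; the head only serves the loss.
head = ProjectionHead()
h1, h2 = torch.randn(8, 512), torch.randn(8, 512)
loss = info_nce(head(h1), head(h2))
```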
We address the problem of unsupervised domain adaptation when the source domain differs from the target domain because of a shift in the distribution of a latent subgroup. When this subgroup confounds all observed data, neither covariate shift nor label shift assumptions apply. We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target. The identification results are constructive, immediately suggesting an algorithm for estimating the optimal predictor in the target. For continuous observations, where this algorithm becomes impractical, we propose a latent variable model tailored to the data-generation process at hand. We show how the approach degrades as the size of the shift changes, and verify that it outperforms both covariate and label shift adjustment.
Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part. Towards this end, masking has emerged as a generic and powerful tool where content is withheld along the sequential dimension, e.g., spatial in images, temporal in audio, and syntactic in language. In this paper, we explore the orthogonal channel dimension for generic data augmentation. The data for each channel is quantized through a non-uniform quantizer, with the quantized value sampled randomly within randomly sampled quantization bins. From another perspective, quantization is analogous to channel-wise masking, as it removes the information within each bin, but preserves the information across bins. We apply the randomized quantization in conjunction with sequential augmentations on self-supervised contrastive models. This generic approach achieves results on par with modality-specific augmentation on vision tasks, and state-of-the-art results on 3D point clouds as well as on audio. We also demonstrate this method to be applicable to augmenting intermediate embeddings in a deep neural network on the comprehensive DABS benchmark, which comprises diverse data modalities. Code is available at http://www.github.com/microsoft/random_quantize.
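Reading the description literally, one hedged way to implement the augmentation is below: per channel, draw random bin edges to form a non-uniform quantizer, then replace every value with a uniform draw from its bin. The bin count and sampling details are assumptions, not the paper's exact recipe; the repository linked above is the authoritative version.

```python
# Hedged sketch of randomized channel-wise quantization as described above.
import torch

def randomized_quantize(x, num_bins=8):
    """x: (C, ...) tensor; quantize each channel with random non-uniform bins."""
    out = torch.empty_like(x)
    for c in range(x.shape[0]):
        ch = x[c]
        lo, hi = ch.min(), ch.max()
        # Randomly sampled, sorted inner bin edges -> a non-uniform quantizer.
        inner = torch.sort(torch.rand(num_bins - 1) * (hi - lo) + lo).values
        edges = torch.cat([lo.view(1), inner, hi.view(1)])
        idx = torch.bucketize(ch, edges[1:-1])          # bin index per element
        left, right = edges[idx], edges[idx + 1]
        # Replace each value with a random draw from its bin: information
        # within a bin is destroyed, information across bins is preserved.
        out[c] = left + torch.rand_like(ch) * (right - left)
    return out

augmented = randomized_quantize(torch.randn(3, 32, 32))
```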
Recent work has reported that AI classifiers trained on audio recordings can accurately predict severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection status. Here, we undertake a large-scale study of audio-based deep learning classifiers as part of the UK government's pandemic response. We collect and analyse a dataset of audio recordings from 67,842 individuals with linked metadata, including reverse transcription polymerase chain reaction (PCR) test outcomes, of whom 23,514 tested positive for SARS-CoV-2. Subjects were recruited via the UK government's National Health Service Test-and-Trace programme and the REal-time Assessment of Community Transmission (REACT) randomised surveillance survey. In an unadjusted analysis of our dataset, AI classifiers predict SARS-CoV-2 infection status with high accuracy (Receiver Operating Characteristic Area Under the Curve (ROC-AUC) 0.846 [0.838, 0.854]), consistent with the findings of previous studies. However, after matching on measured confounders, such as age, gender, and self-reported symptoms, our classifiers' performance is much weaker (ROC-AUC 0.619 [0.594, 0.644]). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by simple predictive scores based on user-reported symptoms.
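As a hedged sketch of what "matching on measured confounders" can look like in code (the study's actual matching protocol is more involved), the helper below scores a classifier only on strata balanced for cases and controls; the column names are hypothetical.

```python
# Illustrative confounder-matched evaluation (not the study's exact protocol).
import pandas as pd
from sklearn.metrics import roc_auc_score

def matched_auc(df: pd.DataFrame, confounders) -> float:
    """df needs columns 'score' (classifier output) and 'label' (PCR result)."""
    matched = []
    for _, stratum in df.groupby(confounders):
        pos = stratum[stratum.label == 1]
        neg = stratum[stratum.label == 0]
        k = min(len(pos), len(neg))
        if k:  # keep equal numbers of cases and controls per stratum
            matched += [pos.sample(k, random_state=0), neg.sample(k, random_state=0)]
    m = pd.concat(matched)
    return roc_auc_score(m.label, m.score)

# e.g. matched_auc(results, ["age_band", "gender", "symptomatic"])  # hypothetical columns
```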
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and of the assessment of the performance of the proposed predictive models. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant metadata. We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features, and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.
State-of-the-art language models are often accurate on many question-answering benchmarks with well-defined questions. Yet, in real settings questions are often unanswerable without asking the user for clarifying information. We show that current SotA models often do not ask the user for clarification when presented with imprecise questions and instead provide incorrect answers or "hallucinate". To address this, we introduce CLAM, a framework that first uses the model to detect ambiguous questions and, if an ambiguous question is detected, prompts the model to ask the user for clarification. Furthermore, we show how to construct a scalable and cost-effective automatic evaluation protocol using an oracle language model with privileged information to provide clarifying information. We show that our method achieves a 20.15 percentage point accuracy improvement over SotA on a novel ambiguous question-answering dataset derived from TriviaQA.
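A minimal sketch of the CLAM-style control flow as the abstract describes it (detect ambiguity, elicit a clarification, then answer), with `llm` and `ask_user` as hypothetical stand-ins rather than a real API; in automatic evaluation, `ask_user` would be the oracle model with privileged information.

```python
# Hedged sketch of the CLAM-style loop; `llm` is a hypothetical model wrapper.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any text-completion model here")

def clam_answer(question: str, ask_user) -> str:
    # Step 1: use the model itself to detect ambiguity.
    verdict = llm(f"Is this question ambiguous? Answer yes or no.\nQ: {question}")
    if verdict.strip().lower().startswith("yes"):
        # Step 2: prompt the model to ask the user for clarification.
        clarifying_q = llm(f"Ask one clarifying question for:\n{question}")
        clarification = ask_user(clarifying_q)
        question = f"{question}\nClarification: {clarification}"
    # Step 3: answer the (possibly clarified) question.
    return llm(f"Answer the question.\nQ: {question}\nA:")
```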